SVM Based Improvement in KNN for Text Categorization

Author

  • Seth Jai Prakash
Abstract

In today's library science, information science, and computer science, online text classification (text categorization) is a major challenge [1]. With the enormous growth of online information and data, text categorization has become one of the crucial techniques for handling and standardizing text data. Various learning algorithms have been applied to text for categorization. In terms of accuracy and efficiency, the KNN (K Nearest Neighbour) algorithm has proved very effective compared to other learning algorithms. The framework of KNN with TF-IDF is studied, and some changes are needed to reduce its time complexity and improve its accuracy. The proposed work is therefore based on an SVM classifier, which helps in splitting the training and testing data and takes less time than the previous work, together with the iKNN (improved KNN) algorithm, which takes less time, gives higher accuracy, and improves text categorization overall.
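
As a point of reference for the KNN-with-TF-IDF framework mentioned in the abstract, the following is a minimal sketch of that baseline only. It is not the paper's iKNN or its SVM step, which the abstract does not specify. The whitespace tokenizer, the smoothed IDF formula, cosine similarity, majority voting, the toy documents, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def fit_idf(token_lists):
    # Assumed smoothed IDF: log((1 + N) / (1 + df)) + 1.
    n_docs = len(token_lists)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    return {term: math.log((1 + n_docs) / (1 + count)) + 1.0
            for term, count in df.items()}


def tfidf_vector(tokens, idf):
    # Sparse TF-IDF vector; terms not seen in training are ignored.
    tf = Counter(t for t in tokens if t in idf)
    return {term: count * idf[term] for term, count in tf.items()}


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def knn_predict(query, train_vectors, train_labels, k=3):
    # Rank training documents by similarity, then majority-vote over the top k.
    ranked = sorted(zip(train_vectors, train_labels),
                    key=lambda pair: cosine(query, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training set (illustrative data, not from the paper).
train_texts = [
    "the team won the football match",
    "the striker scored a late goal",
    "the election results were announced",
    "parliament passed the new budget",
]
train_labels = ["sports", "sports", "politics", "politics"]

train_tokens = [tokenize(t) for t in train_texts]
idf = fit_idf(train_tokens)
train_vectors = [tfidf_vector(t, idf) for t in train_tokens]


def classify(text, k=3):
    # Vectorize the query with the training IDF table and run kNN.
    return knn_predict(tfidf_vector(tokenize(text), idf),
                       train_vectors, train_labels, k)


if __name__ == "__main__":
    print(classify("a goal in the football match"))
    print(classify("the budget vote in parliament"))
```

Each prediction in this sketch compares the query against every training document, which is the kind of per-query cost that the abstract's concern about time complexity refers to.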

Similar resources

Using kNN Model-based Approach for Automatic Text Categorization

An investigation has been conducted on two well-known similarity-based learning approaches to text categorization: the k-nearest neighbor (k-NN) classifier and the Rocchio classifier. After identifying the weaknesses and strengths of each technique, a new classifier called the kNN model-based classifier (kNNModel) has been proposed. It combines the strengths of both k-NN and Rocchio. A text categor...

Hierarchical vs. flat n-gram-based text categorization: Can we do better?

Hierarchical text categorization (HTC) refers to assigning a text document to one or more most suitable categories from a hierarchical category space. In this paper we present two HTC techniques based on kNN and SVM machine learning techniques for categorization process and byte n-gram based document representation. They are fully language independent and do not require any text preprocessing s...

Multiclass Boosting with Adaptive Group-Based kNN and Its Application in Text Categorization

AdaBoost is an excellent committee-based tool for classification. However, its effectiveness and efficiency in multiclass categorization face the challenges from methods based on support vector machine (SVM), neural networks (NN), naïve Bayes, and k-nearest neighbor (kNN). This paper uses a novel multi-class AdaBoost algorithm to avoid reducing the multi-class classification problem to multiple tw...

Text classification: A least square support vector machine approach

This paper presents a least square support vector machine (LS-SVM) that performs text classification of noisy document titles according to different predetermined categories. The system’s potential is demonstrated with a corpus of 91,229 words from University of Denver’s Penrose Library catalogue. The classification accuracy of the proposed LS-SVM based system is found to be over 99.9%. The fin...

A Comparative Study on Chinese Text Categorization Methods

This paper reports our comparative evaluation of three machine learning methods on Chinese text categorization. Whereas a wide range of methods have been applied to English text categorization, relatively few studies have been done on Chinese text categorization. Based on a re-constructed People’s Daily corpus, a series of controlled experiments evaluate three machine learning methods, namely k...

Publication date: 2015